78 research outputs found

    Multiple Ontologies for Integrating Complex Phenotype Datasets

    There has been an emergence of multiple large scale phenotyping projects in the rat model organism community as well as renewed interest in the ongoing phenotype data generated by thousands of researchers using hundreds of rat strains worldwide. Unfortunately, this data is scattered and is neither described nor formatted in a standardized manner. A system to integrate complex phenotype data from multiple sources and facilitate data mining and analysis is being developed using multiple ontologies.

*Introduction*
The potential value of integrating phenotype data from multiple sources (different laboratories, varying techniques to measure similar phenotypes, multiple strains) is enormous. Presented here is a data integration system for complex phenotype data from both large-scale and individual experiments and the taxonomy and ontologies that provide the backbone of this format. RGD along with Mouse Genome Informatics (MGI) (Blake et al, 2009) and the Animal QTL Database (Hu and Reecy, 2007) is developing a Vertebrate Trait Ontology to represent morphological states and physiological processes to be used to annotate quantitative trait loci (QTL) and other data. RGD has also used the Mammalian Phenotype Ontology (Smith et al, 2005) for several years to indicate the relationship of genomic elements to abnormal phenotypes. The Vertebrate Trait Ontology represents what is being assessed, and the Mammalian Phenotype Ontology represents the conclusion that was made. The system presented here represents what was done to measure the trait in order to reach the conclusion. Because of the close relationship among these ontologies, care is being taken to ensure compatibility and similarity in structure using the phenotype properties in the Phenotypic Quality Ontology (PATO) for guidance. ("http://www.bioontology.org/wiki/index.php/PATO:Main_Page":http://www.bioontology.org/wiki/index.php/PATO:Main_Page) 

*Data Format and Ontologies*
Standardized data types and relationships to define the phenotype experiment and its resulting data are being developed, along with the ontologies used to standardize descriptive fields. For phenotype data, the major informational components include Researcher, Study, Experiment, Sample, Experimental Conditions and Clinical Measurement. A Rat Strain Taxonomy has been developed to standardize strain information and to provide the relationships among strains, allowing investigators to retrieve and analyze phenotype data for genetically related strains. Two important aspects of a phenotype measurement are 1) what was measured and 2) how it was measured. The Clinical Measurement Ontology and the Measurement Method Ontology are being developed to standardize this information. In addition, an Experimental Conditions Ontology is under construction to allow integration of data measured under various conditions.
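
The components listed above could be modelled roughly as follows. This is a minimal sketch, not RGD's actual schema: the class names, field names, strain label and numeric values are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClinicalMeasurement:
    term: str    # what was measured (a Clinical Measurement Ontology term)
    method: str  # how it was measured (a Measurement Method Ontology term)
    value: float
    units: str


@dataclass
class Experiment:
    strain: str            # entry in the Rat Strain Taxonomy
    conditions: List[str]  # Experimental Conditions terms
    sample_size: int
    measurements: List[ClinicalMeasurement] = field(default_factory=list)


@dataclass
class Study:
    researcher: str
    title: str
    experiments: List[Experiment] = field(default_factory=list)


def measurements_for(studies, strain, term):
    """Collect every measurement of `term` recorded for `strain`."""
    return [
        m
        for study in studies
        for exp in study.experiments
        if exp.strain == strain
        for m in exp.measurements
        if m.term == term
    ]


# Hypothetical records; the values are invented, not real phenotype data.
studies = [
    Study("Researcher 1", "Toy study", [
        Experiment("strain_A", ["control diet"], 8, [
            ClinicalMeasurement("systolic blood pressure", "tail cuff", 120.0, "mmHg"),
            ClinicalMeasurement("heart rate", "telemetry", 350.0, "beats/min"),
        ]),
        Experiment("strain_B", ["control diet"], 8, [
            ClinicalMeasurement("systolic blood pressure", "telemetry", 135.0, "mmHg"),
        ]),
    ]),
]

hits = measurements_for(studies, "strain_A", "systolic blood pressure")
print([(m.method, m.value, m.units) for m in hits])
```

Keeping "what was measured" and "how it was measured" as separate fields is what lets values obtained by different methods be retrieved together or filtered apart.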

*Pilot Study Results*
Cardiovascular and biochemistry phenotype data from two major datasets have been integrated using the Rat Strain Taxonomy and the three phenotype related ontologies. A prototype data mining tool ("http://rgd.mcw.edu/rgdweb/":http://rgd.mcw.edu/rgdweb/) has also been developed that provides the user with options to begin a search with strains or any of the ontologies and make subsequent filter choices from the other ontologies. Choices presented to the user are restricted to those for which data is available and query tracking functions are provided to alert the user to the number of results being returned and the query choices made.

*References*
Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE; Mouse Genome Database Group, 2009 _Nucleic Acids Res_. Jan;37:D712-9.

Hu ZL, Reecy JM. Animal QTLdb: beyond a repository. A public platform for QTL comparisons and integration with diverse types of structural genomic information, _Mamm Genome_. 2007 Jan;18(1):1-4.

Smith CL, Goldsmith CA, Eppig JT. The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information, _Genome Biol_. 2005 6(1):R7.

    PhenoGeneRanker: A Tool for Gene Prioritization Using Complete Multiplex Heterogeneous Networks

    Uncovering genotype-phenotype relationships is a fundamental challenge in genomics. Gene prioritization is an important step in this endeavor: it reduces a list of thousands of genes coming from high-throughput studies to a short, manageable one. Network propagation methods are promising, state-of-the-art methods for gene prioritization, based on the premise that functionally related genes tend to be close to each other in biological networks. In this study, we present PhenoGeneRanker, an improved version of a recently developed network propagation method called Random Walk with Restart on Multiplex Heterogeneous Networks (RWR-MH). PhenoGeneRanker allows multi-layer gene and disease networks. It also calculates empirical p-values of gene ranking using random stratified sampling of genes based on their connectivity degree in the network. We ran PhenoGeneRanker using multi-omics datasets of rice to effectively prioritize the cold tolerance-related genes. We observed that top genes selected by PhenoGeneRanker were enriched in cold tolerance-related Gene Ontology (GO) terms, whereas bottom-ranked genes were enriched in general GO terms only. We also observed that top-ranked genes exhibited significant p-values, suggesting that their rankings were independent of their degree in the network.
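
The propagation idea behind this family of methods can be illustrated with a minimal sketch of random walk with restart on a single toy network. This is not PhenoGeneRanker's implementation: the graph, the gene names, the restart probability of 0.7 and the function name `random_walk_with_restart` are assumptions for illustration, and the multiplex and heterogeneous layers and the empirical p-value step are left out.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> list of neighbours (undirected toy graph;
               every node is assumed to have at least one neighbour).
    seeds:     nodes the walker jumps back to, e.g. known trait-related genes.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical chain-shaped gene network: geneA - geneB - geneC - geneD - geneE.
toy_network = {
    "geneA": ["geneB"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneB", "geneD"],
    "geneD": ["geneC", "geneE"],
    "geneE": ["geneD"],
}

scores = random_walk_with_restart(toy_network, seeds={"geneA"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Genes closer to the seed accumulate more probability mass, which is the sense in which network proximity is turned into a ranking.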

    Rat Strain Ontology: structured controlled vocabulary designed to facilitate access to strain data at RGD

    BACKGROUND: The Rat Genome Database (RGD) (http://rgd.mcw.edu/) is the premier site for comprehensive data on the different strains of the laboratory rat (Rattus norvegicus). The strain data are collected from various publications, direct submissions from individual researchers, and rat providers worldwide. Rat strain, substrain designation and nomenclature follow the Guidelines for Nomenclature of Mouse and Rat Strains, instituted by the International Committee on Standardized Genetic Nomenclature for Mice. While symbols and names aid in identifying strains correctly, the flat nature of this information prohibits easy search and retrieval, as well as other data mining functions. In order to improve these functionalities, particularly in ontology-based tools, the Rat Strain Ontology (RS) was developed. RESULTS: The Rat Strain Ontology (RS) reflects the breeding history, parental background, and genetic manipulation of rat strains. This controlled vocabulary organizes strains by type: inbred, outbred, chromosome altered, congenic, mutant and so on. In addition, under the chromosome altered category, strains are organized by chromosome, and further by type of manipulations, such as mutant or congenic. This allows users to easily retrieve strains of interest with modifications in specific genomic regions. The ontology was developed using the Open Biological and Biomedical Ontology (OBO) file format, and is organized on the Directed Acyclic Graph (DAG) structure. Rat Strain Ontology IDs are included as part of the strain report (RS: ######). CONCLUSIONS: As rat researchers are often unaware of the number of substrains or altered strains within a breeding line, this vocabulary now provides an easy way to retrieve all substrains and accompanying information. Its usefulness is particularly evident in tools such as the PhenoMiner at RGD, where users can now easily retrieve phenotype measurement data for related strains, strains with similar backgrounds or those with similar introgressed regions. This controlled vocabulary also allows better retrieval and filtering for QTLs and in genomic tools such as the GViewer. The Rat Strain Ontology has been incorporated into the RGD Ontology Browser (http://rgd.mcw.edu/rgdweb/ontology/view.html?acc_id=RS:0000457#s) and is available through the National Center for Biomedical Ontology (http://bioportal.bioontology.org/ontologies/1150) or the RGD ftp site (ftp://rgd.mcw.edu/pub/ontology/rat_strain/).
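
Retrieving "all substrains" from a DAG-structured vocabulary amounts to collecting the descendants of a term. The sketch below assumes a tiny hand-written hierarchy; the term names and the `IS_A` mapping are hypothetical placeholders, not actual Rat Strain Ontology content, and real data would be read from the OBO file instead.

```python
from collections import defaultdict, deque

# Hypothetical is_a edges: child term -> parent terms.
# In a DAG a term may have more than one parent.
IS_A = {
    "inbred strain": ["rat strain"],
    "congenic strain": ["rat strain"],
    "strain_A": ["inbred strain"],
    "strain_A/sub1": ["strain_A"],
    "strain_A/sub2": ["strain_A"],
    "strain_A.B-congenic": ["congenic strain", "strain_A"],
}


def build_children(is_a):
    """Invert child -> parents into parent -> children."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    return children


def descendants(term, children):
    """Breadth-first walk collecting every term below `term`."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:  # a term reachable by two paths is visited once
                seen.add(child)
                queue.append(child)
    return seen


children = build_children(IS_A)
print(sorted(descendants("strain_A", children)))
```

Because the congenic placeholder has two parents, it is found both under its background strain and under the congenic category, which a flat list of strain symbols cannot express.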

    Disease Ontology: improving and unifying disease annotations across species.

    Model organisms are vital to uncovering the mechanisms of human disease and developing new therapeutic tools. Researchers collecting and integrating relevant model organism and/or human data often apply disparate terminologies (vocabularies and ontologies), making comparisons and inferences difficult. A unified disease ontology is required that connects data annotated using diverse disease terminologies, and in which the terminology relationships are continuously maintained. The Mouse Genome Database (MGD, http://www.informatics.jax.org), Rat Genome Database (RGD, http://rgd.mcw.edu) and Disease Ontology (DO, http://www.disease-ontology.org) projects are collaborating to augment DO, aligning and incorporating disease terms used by MGD and RGD, and improving DO as a tool for unifying disease annotations across species. Coordinated assessment of MGD's and RGD's disease term annotations identified new terms that enhance DO's representation of human diseases. Expansion of DO term content and cross-references to clinical vocabularies (e.g. OMIM, ORDO, MeSH) has enriched the DO's domain coverage and utility for annotating many types of data generated from experimental and clinical investigations. The extension of anatomy-based DO classification structure of disease improves accessibility of terms and facilitates application of DO for computational research. A consistent representation of disease associations across data types from cellular to whole organism, generated from clinical and model organism studies, will promote the integration, mining and comparative analysis of these data. The coordinated enrichment of the DO and adoption of DO by MGD and RGD demonstrates DO's usability across human data, MGD, RGD and the rest of the model organism database community. Dis Model Mech 2018 Mar 12;11(3):dmm032839

    Automated generation of gene summaries at the Alliance of Genome Resources

    Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages
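
One way to picture grouping terms by common ancestors under a length limit is the sketch below. It is an illustration under stated assumptions, not the Alliance algorithm: the toy ontology, the annotation counts used to derive information content, the pairwise merging strategy and the sentence template are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy ontology: term -> parent terms (single root).
PARENTS = {
    "kinase activity": ["catalytic activity"],
    "phosphatase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "DNA binding": ["binding"],
    "binding": ["molecular function"],
    "molecular function": [],
}
# Hypothetical annotation counts; rarer terms carry more information.
COUNTS = {
    "molecular function": 1000, "catalytic activity": 400, "binding": 500,
    "kinase activity": 50, "phosphatase activity": 40, "DNA binding": 100,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    found, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def information_content(term):
    return -math.log2(COUNTS[term] / COUNTS["molecular function"])


def best_common_ancestor(a, b):
    return max(ancestors(a) & ancestors(b), key=information_content)


def trim_terms(terms, max_terms):
    """Merge the pair with the most informative common ancestor until short enough."""
    if max_terms < 1:
        raise ValueError("max_terms must be at least 1")
    current = set(terms)
    while len(current) > max_terms:
        a, b = max(
            combinations(sorted(current), 2),
            key=lambda pair: information_content(best_common_ancestor(*pair)),
        )
        merged = best_common_ancestor(a, b)
        # Drop every term the merged ancestor already covers, then add it.
        current = {t for t in current if merged not in ancestors(t)}
        current.add(merged)
    return current


def render(gene, terms):
    """Fill a (hypothetical) sentence template from a set of terms."""
    ordered = sorted(terms)
    listed = ordered[0] if len(ordered) == 1 else ", ".join(ordered[:-1]) + " and " + ordered[-1]
    return f"{gene} exhibits {listed}."


summary_terms = trim_terms(["kinase activity", "phosphatase activity", "DNA binding"], 2)
print(render("geneX", summary_terms))
```

Preferring the merge with the most informative ancestor keeps the shortened list as specific as the limit allows, rather than collapsing everything to the root term.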

    Using Multiple Ontologies to Integrate Complex Biological Data

    The strength of the rat as a model organism lies in its utility in pharmacology, biochemistry and physiology research. Data resulting from such studies are difficult to represent in databases, and creating user-friendly data mining tools has proved difficult. The Rat Genome Database has developed a comprehensive ontology-based data structure and annotation system to integrate physiological data along with environmental and experimental factors, as well as genetic and genomic information. RGD uses multiple ontologies to integrate complex biological information from the molecular level to the whole organism, and to develop data mining and presentation tools. This approach allows RGD to indicate not only the phenotypes seen in a strain but also the specific values under each diet and atmospheric condition, as well as gender differences. Harnessing the power of ontologies in this way allows the user to gather and filter data in a customized fashion; for example, a researcher can retrieve all phenotype readings for which hypoxia is a factor. Utilizing the same data structure for expression data, pathways and biological processes, RGD will provide a comprehensive research platform which allows users to investigate the conditions under which biological processes are altered and to elucidate the mechanisms of disease.

    The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases.

    Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified look and feel, the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient knowledge commons for model organisms using shared, modular infrastructure